Skip to content

Added PyGithub package to invoke github api calls#54

Open
banginji wants to merge 8 commits intomainfrom
use-github-package
Open

Added PyGithub package to invoke github api calls#54
banginji wants to merge 8 commits intomainfrom
use-github-package

Conversation

@banginji
Copy link
Collaborator

@banginji banginji commented Mar 7, 2026

1- Except get_file and resolve_commit_hashes the rest use the package
2- get_file uses requests since iter_chunks helps with reading large files and using the package wasn't straightforward to handle large files
3- resolve_commit_hashes uses requests because the package didn't give a value add

This handles #47

@github-actions
Copy link

github-actions bot commented Mar 7, 2026

Detected 3 changes to dependencies in Poetry lockfile

From base 251a4cb to head aaf47b6:

Added pygithub (2.8.1)
Added pyjwt (2.11.0)
Added pynacl (1.6.2)

(3 added, 0 removed, 0 updated, 68 not changed)

Generated by diff-poetry-lock 1.0.1.dev0

@banginji banginji requested a review from colindean March 8, 2026 22:31
Copy link
Collaborator

@colindean colindean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this shaves… 5 lines of code, but we basically gain some standardization and gets pagination for free where we weren't using it (and may not actually need it…).

How do you feel about this? Is it worth the switch?

Comment on lines +74 to +76
issue = self.repo.get_issue(int(self.s.pr_num))
issue_comment = issue.get_comment(comment_id)
issue_comment.edit(f"{MAGIC_COMMENT_IDENTIFIER}{comment}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: Hmmmmmmmmmmmm this is tripling the requests, eh?

I think we can use the data we already have to initialize an Issue and save the first and maybe also do it for IssueComment.

issue_comment = github.IssueComment(id=comment_id)

but this might explicitly need a URL to be set, too. We have the info to build that URL…

issue_comment = github.IssueComment(
    id=comment_id,
    url=f"{self.s.api_url}/repos/{self.s.repository}/issues/comments/{comment_id}"
)

could extract that URL building to a function…

def build_issue_comment_url(self, comment_id: int) -> str:
    return f"{self.s.api_url}/repos/{self.s.repository}/issues/comments/{comment_id}"

issue_comment = github.IssueComment(
    id=comment_id,
    url=build_issue_comment_url(comment_id)
)

This reduces the reliance on the library, which we're increasing drastically in this PR, but saves a few extra requests and thus a few tens of milliseconds of compute and API limits and whatnot. Probably worth it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Yes I'll look at customizing it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified with a single api call

Comment on lines +182 to +184
issue = self.repo.get_issue(int(self.s.pr_num))
issue_comment = issue.get_comment(comment_id)
issue_comment.delete()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: Same deal, and likely same deal in other places where it takes a few more HTTP requests to get what we need when using the library… when we already have some or all of the metadata that the library is going to get in those requests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with package api helpers and a single api call

Comment on lines +186 to +193
class Headers(Enum):
"""Enum for github api headers."""

JSON = "application/vnd.github+json"
RAW = "application/vnd.github.raw"

def headers(self, token: str) -> dict[str, str]:
return {"Authorization": f"Bearer {token}", "Accept": self.value}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Is this still used? I think PyGitHub might provide this. Check out github/Auth.py, perhaps github.Token to wrap the token and github.Consts for the MIME types.

How we build the headers dict for the few requests we are managing… might still be this method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the headers are used in the two get_file and resolve_commit_hashes that I didn't switch but yes I can take a look at Auth.py to reuse it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enums have been deleted now since the package api helpers handle it

@banginji
Copy link
Collaborator Author

banginji commented Mar 9, 2026

So, this shaves… 5 lines of code, but we basically gain some standardization and gets pagination for free where we weren't using it (and may not actually need it…).

How do you feel about this? Is it worth the switch?

I felt the library wasn't that worth to switch over to because I feel we have to customize each use case we've either to reduce the network calls or just that it doesn't support some use cases yet like the two calls we make: get_file and resolve_commit_hashes which still uses requests

We could still incorporate it for standardization though since if we decide to go with the lib, we can set it up once after customization and reap its benefits

@colindean
Copy link
Collaborator

it doesn't support some use cases yet like the two calls we make: get_file and resolve_commit_hashes which still uses requests

Would it be worth suggesting those two methods to PyGitHub with a PR?

@banginji
Copy link
Collaborator Author

it doesn't support some use cases yet like the two calls we make: get_file and resolve_commit_hashes which still uses requests

Would it be worth suggesting those two methods to PyGitHub with a PR?

For the get_file, I don't think it would because when I was refactoring the calls, I read that the github contents api has a cap of allowing the downloads of files < 1MB so the addition of stream wouldn't help if we're targeting larger files (our current setup has the same limitation of 1MB). It also looks like for larger files 1MB - 100MB, we'll have to invoke the git_get_blob api
So if we're trying to handle it, we could do something like:

pygithubs' get_contents -> fail?(because file size > 1MB) -> get_git_blob -> fail?(because file size > 100MB) -> get_contents (get the download_url) -> make requests call and stream to get the file

As for the resolve_commit_hashes, we'll be using the graphql api using the requester from the library in my refactor so I think that'll covered

@banginji
Copy link
Collaborator Author

With my most recent changes, I've replaced all the github api calls with the package except get_file which I can take a look at in a separate pr to implement the following flow

pygithubs' get_contents -> fail?(because file size > 1MB) -> get_git_blob -> fail?(because file size > 100MB) -> get_contents (get the download_url) -> make requests call and stream to get the file

@banginji banginji force-pushed the use-github-package branch from 8da0c92 to 4338b50 Compare March 11, 2026 19:15
1- Except get_file the rest use the package
2- get_file uses requests since iter_chunks helps with reading large files and using the package wasn't straightforward to handle large files

Signed-off-by: banginji <7316646+banginji@users.noreply.github.com>
Replaced graphql api call with vanilla requests call since using the package doesn't have a value add

Signed-off-by: banginji <7316646+banginji@users.noreply.github.com>
Currently, all changes are not being read by the action so fixing it by adding the ref with a depth of 0 to get all commits in the pr branch to checkout and run the tool

Signed-off-by: banginji <7316646+banginji@users.noreply.github.com>
Fixed the action to checkout all changes to the poetry.lock file alone in pr and run the tool against the main branch

Signed-off-by: banginji <7316646+banginji@users.noreply.github.com>
Signed-off-by: banginji <7316646+banginji@users.noreply.github.com>
1- Remaining get_file call that still relies on requests might be addressed in a future pull request

Signed-off-by: banginji <7316646+banginji@users.noreply.github.com>
Signed-off-by: banginji <7316646+banginji@users.noreply.github.com>
@banginji banginji force-pushed the use-github-package branch from f486b3e to ce98977 Compare March 13, 2026 04:23
Current setup is throwing a 403 when attempting to update an existing comment so added some token permissions to the action

Signed-off-by: banginji <7316646+banginji@users.noreply.github.com>
@banginji
Copy link
Collaborator Author

After I merged in #56, it fixed the sha resolution in the diff pr comment but it is failing the 'diff-poetry-lock safe for forks' since it is a pull_request_target event which is looking at the base branch, main, but the env var for 'ref' has been changed to 'github_head_ref' instead of what it was previously 'github_ref'
github_head_ref -> branchs' ref
github_ref -> depends on the event type: pull_request refers to the branch ref and pull_request_target refers to the main branch ref
Will have to figure out how to handle this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants